Learning Finite-State Controllers for Partially Observable Environments
Authors
Nicolas Meuleau, Leonid Peshkin, Kee-Eung Kim, Leslie Pack Kaelbling
Abstract
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore’s VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.
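For a concrete picture of the object being learned, here is a minimal sketch (in Python, not the authors' code) of a stochastic finite-state controller whose internal-state transitions and action choices are parameterized and trained by a REINFORCE-style stochastic-gradient update. The environment interface (`reset`/`step` with integer observations), the softmax parameterization, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch, NOT the authors' implementation: a stochastic
# finite-state controller (FSC) trained by episodic REINFORCE, i.e.
# stochastic gradient ascent on expected discounted return.
# Assumed environment interface: env.reset() -> int observation;
# env.step(action) -> (int observation, float reward, bool done).
import numpy as np

class FiniteStateController:
    def __init__(self, n_nodes, n_obs, n_actions, lr=0.01):
        # Logits of the two stochastic maps that define the FSC:
        #   psi[n, o, :]  -> distribution over the next internal node
        #   theta[n, :]   -> distribution over actions in node n
        self.psi = np.zeros((n_nodes, n_obs, n_nodes))
        self.theta = np.zeros((n_nodes, n_actions))
        self.lr = lr

    @staticmethod
    def _softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def run_episode(self, env, gamma=0.99, max_steps=100):
        """Sample one trajectory, then take one stochastic-gradient step."""
        node, obs = 0, env.reset()
        g_psi = np.zeros_like(self.psi)
        g_theta = np.zeros_like(self.theta)
        ret, discount = 0.0, 1.0
        for _ in range(max_steps):
            # Sample the internal transition, then an action in the new node.
            p_n = self._softmax(self.psi[node, obs])
            nxt = np.random.choice(len(p_n), p=p_n)
            p_a = self._softmax(self.theta[nxt])
            act = np.random.choice(len(p_a), p=p_a)
            # Accumulate score-function gradients (one-hot minus probs).
            g_psi[node, obs] -= p_n
            g_psi[node, obs, nxt] += 1.0
            g_theta[nxt] -= p_a
            g_theta[nxt, act] += 1.0
            obs, reward, done = env.step(act)
            ret += discount * reward
            discount *= gamma
            node = nxt
            if done:
                break
        # REINFORCE: weight the accumulated gradients by the episode return.
        self.psi += self.lr * ret * g_psi
        self.theta += self.lr * ret * g_theta
        return ret
```

Because each update uses a single sampled trajectory, this sits at the stochastic end of the stochastic-versus-exact gradient comparison the abstract raises; an exact-gradient variant would instead average over the trajectory distribution before updating.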
Similar papers
Induction and learning of finite-state controllers from simulation
We propose a method to generate agent controllers, represented as state machines, to act in partially observable environments. Such controllers are used to constrain the search space, applying techniques from Hierarchical Reinforcement Learning. We define a multi-step process, in which a simulator is employed to generate possible traces of execution. Those traces are then utilized to induce a n...
Learning Finite State Controllers from Simulation
We propose a methodology to automatically generate agent controllers, represented as state machines, to act in partially observable environments. We define a multi-step process, in which increasingly accurate models, generally too complex to be used for planning, are employed to generate possible traces of execution by simulation. Those traces are then utilized to induce a state machine that rep...
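As a rough illustration of the trace-to-automaton step these two abstracts describe (the excerpts do not give the actual induction algorithm, so the data format and function below are hypothetical): simulated traces of (observation, action) pairs can first be folded into a prefix-tree automaton, a common starting point before state merging.

```python
# Hypothetical sketch of trace-based automaton induction: fold simulated
# traces of (observation, action) pairs into a prefix-tree automaton.
# The papers' actual induction procedure is not specified in these excerpts.
from collections import defaultdict

def induce_prefix_tree(traces):
    """transitions[state][obs] -> next state (0 is the root);
    action_counts[state][action] -> frequency of `action` in that state."""
    transitions = defaultdict(dict)
    action_counts = defaultdict(lambda: defaultdict(int))
    next_state = 1
    for trace in traces:
        state = 0
        for obs, action in trace:
            if obs not in transitions[state]:
                transitions[state][obs] = next_state
                next_state += 1
            state = transitions[state][obs]
            action_counts[state][action] += 1
    return transitions, action_counts

# Example with toy traces:
traces = [[("wall", "turn"), ("open", "forward")],
          [("wall", "turn"), ("wall", "turn")]]
transitions, counts = induce_prefix_tree(traces)
```

A state-merging pass (collapsing states with compatible action statistics) would then compress the tree into a compact controller; the merging criterion is where such induction methods typically differ.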
Supervisor Synthesis of POMDP based on Automata Learning
As a general and thus popular model for autonomous systems, the partially observable Markov decision process (POMDP) can capture uncertainties from different sources, such as sensing noise, actuation errors, and uncertain environments. However, this comprehensiveness makes planning and control in POMDPs difficult. Traditional POMDP planning problems aim to find the optimal policy to maximize the ...
Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains
Planning as inference has recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single- and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, there is a need to develop techniqu...
Learning and Planning in Multiagent POMDPs Using Finite-State Models of Other Agents
My thesis work provides a new framework for planning in multiagent, stochastic, partially observable domains with little knowledge about other agents. The relevance of the contribution lies in the variety of practical applications this approach can help tackle, given the very generic assumptions about the environment and the other agents. In order to cope with this level of generality, Bayesi...